Fact distribution in Information Extraction
Author
Abstract
Several recent Information Extraction (IE) systems have been restricted to identifying facts that are described within a single sentence. It is not clear what effect this restriction has on the difficulty of the extraction task, or how the performance of systems that consider only single sentences should be compared with that of systems that consider multiple sentences. This paper compares three IE evaluation corpora from the Message Understanding Conferences and finds that a significant proportion of the facts mentioned in them are not described within a single sentence. Therefore, systems that are evaluated only on facts described within single sentences are being tested against a limited portion of the relevant information in the text, and it is difficult to compare their performance with that of other systems. Further analysis demonstrates that anaphora resolution and world knowledge are required to combine information described across multiple sentences. This result has implications for the development and evaluation of IE systems.
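The abstract's central measurement, how many facts are described within a single sentence, can be made concrete with a short sketch. This is a minimal illustration under stated assumptions, not the paper's own procedure. Each fact is represented (hypothetically) as a mapping from slot names to the set of sentence indices in which that slot's filler is mentioned, with coreferent mentions already merged. A fact counts as single-sentence if at least one sentence mentions every filler. The `Fact` class, the helper functions, the slot names and the toy data are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    """Hypothetical fact representation: each slot filler maps to the set of
    sentence indices in which that filler is mentioned (coreferent mentions
    are assumed to have been merged already)."""
    fillers: dict[str, set[int]]


def within_single_sentence(fact: Fact) -> bool:
    """Return True if some sentence mentions every filler of the fact.

    Assumption: a fact is 'single-sentence' when the sentence sets of all
    its fillers share at least one index.
    """
    sentence_sets = list(fact.fillers.values())
    if not sentence_sets:
        return False
    return bool(set.intersection(*sentence_sets))


def single_sentence_proportion(facts: list[Fact]) -> float:
    """Fraction of facts that are fully described within one sentence."""
    if not facts:
        return 0.0
    return sum(within_single_sentence(f) for f in facts) / len(facts)


# Toy, invented data with hypothetical slot names.
facts = [
    # Every filler is mentioned in sentence 0.
    Fact({"person": {0}, "post": {0}, "organisation": {0}}),
    # The person is mentioned in sentences 2 and 3 (e.g. a name, then a
    # pronoun), but the organisation appears only in sentence 1.
    Fact({"person": {2, 3}, "post": {3}, "organisation": {1}}),
]
print(single_sentence_proportion(facts))  # 0.5
```

Under this definition, the second toy fact can only be assembled by combining sentences 1 and 3. Merging the person's mentions across sentences is where anaphora resolution comes in, and deciding that the organisation in sentence 1 belongs to the same fact may need world knowledge.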
Similar resources
Stochastic Fuzzy Discrimination Information Measure Cost Function in Image Processing
A new cost function based on stochastic fuzzy discrimination information measure is introduced in this paper. Focusing on their significant parts, this cost function is used to find the optimal value of threshold for denoising image. It is, in fact, an extension of fuzzy entropy cost function by the present author. Multivariable normal distribution is used for creating focus on significant part...
Enabling Public Access to Non-Open Access Biomedical Literature via Idea-Expression Dichotomy and Fact Extraction
The general public shows great potential for utilizing scientific research. For example, a singer discovered her ectopic pregnancy by looking up clinical case reports. However, an exorbitant paywall impedes the public’s access to scientific literature. Our case study on a social network demonstrates a growing need for non-open access publications, especially for biomedical literature. The chall...
Automatic Music Genre Identification
Nowadays, automatic analysis of music signals has gained a considerable importance due to the growing amount of music data found on the Web. Music genre classification is one of the interesting research areas in music information retrieval systems. In this paper several techniques were implemented and evaluated for music genre classification including feature extraction, feature selection and m...
A Specialised Verb Lexicon as the Basis of Fact Extraction in the Biomedical Domain
The BioLexicon is a standardised, reusable, lexical and conceptual resource suitable for advanced biomedical text mining. One of the unique features of the BioLexicon is the incorporation of rich syntactic and semantic patterns for a wide range of domain-relevant verbs, which have been acquired semiautomatically from biomedical corpora. Such types of information can be highly beneficial for inf...
Stochastic Comparisons of Probability Distribution Functions with Experimental Data in a Liquid-Liquid Extraction Column for Determination of Drop Size Distributions
The droplet size distribution in the column is usually represented as the average volume to surface area, known as the Sauter mean drop diameter. It is a key variable in the extraction column design. A study of the drop size distribution and Sauter-mean drop diameter for a liquid-liquid extraction column has been presented for a range of operating conditions and three different liquid-liquid sy...
Journal title:
- Language Resources and Evaluation
Volume: 40
Publication date: 2006